{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Morphotagger\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deeppavlov/DeepPavlov/blob/master/docs/features/models/morpho_tagger.ipynb)\n",
    "\n",
    "# Table of contents \n",
    "\n",
    "1. [Introduction to the task](#1.-Introduction-to-the-task)\n",
    "\n",
    "2. [Get started with the model](#2.-Get-started-with-the-model)\n",
    "\n",
    "3. [Models list](#3.-Models-list)\n",
    "\n",
    "4. [Use the model for prediction](#4.-Use-the-model-for-prediction)\n",
    "\n",
    "    4.1. [Predict using Python](#4.1-Predict-using-Python)\n",
    "\n",
    "    4.2. [Predict using CLI](#4.2-Predict-using-CLI)\n",
    "\n",
    "5. [Customize the model](#4.-Customize-the-model)\n",
    "\n",
    "# 1. Introduction to the task\n",
    "\n",
    "Morphological tagging is definition morphological tags, such as case, number, gender, aspect etc. for text tokens.\n",
    "\n",
    "An example:\n",
    "```\n",
    "Я шёл домой по незнакомой улице.\n",
    "```\n",
    "```\n",
    "1\tЯ\tя\tPRON\t_\tCase=Nom|Number=Sing|Person=1\t_\t_\t_\t_\n",
    "2\tшёл\tидти\tVERB\t_\tAspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t_\t_\t_\t_\n",
    "3\tдомой\tдомой\tADV\t_\tDegree=Pos\t_\t_\t_\t_\n",
    "4\tпо\tпо\tADP\t_\t_\t_\t_\t_\t_\n",
    "5\tнезнакомой\tнезнакомый\tADJ\t_\tCase=Dat|Degree=Pos|Gender=Fem|Number=Sing\t_\t_\t_\t_\n",
    "6\tулице\tулица\tNOUN\t_\tAnimacy=Inan|Case=Dat|Gender=Fem|Number=Sing\t_\t_\t_\t_\n",
    "7\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n",
    "```\n",
    "\n",
    "The model is based on [BERT for token classification](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForTokenClassification).\n",
    "The model is trained on [Universal Dependencies corpora](https://universaldependencies.org/) (version 2.3).\n",
    "\n",
    "# 2. Get started with the model\n",
    "\n",
    "First make sure you have the DeepPavlov Library installed.\n",
    "[More info about the first installation.](http://docs.deeppavlov.ai/en/master/intro/installation.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q deeppavlov"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before using the model make sure that all required packages are installed running the command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m deeppavlov install morpho_ru_syntagrus_bert"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Models list\n",
    "\n",
    "The table presents comparison of ``morpho_ru_syntagrus_bert`` config with other models on UD2.3 dataset.\n",
    "\n",
    "| Model | Accuracy |\n",
    "| :--- | :---: |\n",
    "| UDPipe | 93.5 |\n",
    "| morpho_ru_syntagrus_bert | 97.6 |\n",
    "\n",
    "# 4. Use the model for prediction\n",
    "\n",
    "## 4.1 Predict using Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from deeppavlov import build_model\n",
    "\n",
    "model = build_model(\"morpho_ru_syntagrus_bert\", download=True, install=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\tЯ\tя\tPRON\t_\tCase=Nom|Number=Sing|Person=1\t_\t_\t_\t_\n",
      "2\tшёл\tшёл\tVERB\t_\tAspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t_\t_\t_\t_\n",
      "3\tдомой\tдомой\tADV\t_\tDegree=Pos\t_\t_\t_\t_\n",
      "4\tпо\tпо\tADP\t_\t_\t_\t_\t_\t_\n",
      "5\tнезнакомой\tнезнакомый\tADJ\t_\tCase=Dat|Degree=Pos|Gender=Fem|Number=Sing\t_\t_\t_\t_\n",
      "6\tулице\tулица\tNOUN\t_\tAnimacy=Inan|Case=Dat|Gender=Fem|Number=Sing\t_\t_\t_\t_\n",
      "7\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n",
      "\n",
      "1\tДевушка\tдевушка\tNOUN\t_\tAnimacy=Anim|Case=Nom|Gender=Fem|Number=Sing\t_\t_\t_\t_\n",
      "2\tпела\tпеть\tVERB\t_\tAspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t_\t_\t_\t_\n",
      "3\tв\tв\tADP\t_\t_\t_\t_\t_\t_\n",
      "4\tцерковном\tцерковном\tADJ\t_\tCase=Loc|Degree=Pos|Gender=Masc|Number=Sing\t_\t_\t_\t_\n",
      "5\tхоре\tхор\tNOUN\t_\tAnimacy=Inan|Case=Loc|Gender=Masc|Number=Sing\t_\t_\t_\t_\n",
      "6\tо\tо\tADP\t_\t_\t_\t_\t_\t_\n",
      "7\tвсех\tвесь\tDET\t_\tCase=Loc|Number=Plur\t_\t_\t_\t_\n",
      "8\tуставших\tустать\tVERB\t_\tAspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act\t_\t_\t_\t_\n",
      "9\tв\tв\tADP\t_\t_\t_\t_\t_\t_\n",
      "10\tчужом\tчужом\tADJ\t_\tCase=Loc|Degree=Pos|Gender=Masc|Number=Sing\t_\t_\t_\t_\n",
      "11\tкраю\tкрай\tNOUN\t_\tAnimacy=Inan|Case=Loc|Gender=Masc|Number=Sing\t_\t_\t_\t_\n",
      "12\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
     ]
    }
   ],
   "source": [
    "sentences = [\"Я шёл домой по незнакомой улице.\", \"Девушка пела в церковном хоре о всех уставших в чужом краю.\"]\n",
    "for parse in model(sentences):\n",
    "    print(parse)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 Predict using CLI\n",
    "\n",
    "You can also get predictions in an interactive mode through CLI (Сommand Line Interface)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python -m deeppavlov interact morpho_ru_syntagrus_bert -d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`-d` is an optional download key (alternative to `download=True` in Python code). The key `-d` is used to download the pre-trained model along with embeddings and all other files needed to run the model.\n",
    "\n",
    "# 5. Customize the model\n",
    "\n",
    "To train **morphotagger** on your own data, you should prepare a dataset in **CoNLL-U format**. The description of **CoNLL-U format** can be found [here](https://universaldependencies.org/format.html#conll-u-format).\n",
    "\n",
    "Then you should place files for training, validation and testing into the ``\"data_path\"`` directory of ``morphotagger_dataset_reader``, change file names in ``morphotagger_dataset_reader`` to your filenames and launch the training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from deeppavlov import train_model\n",
    "\n",
    "train_model(\"<your_morphotagging_config_name>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or **using CLI**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python -m deeppavlov train <your_morphotagging_config_name>"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
